Cache Line Coloring Procedure Placement Using Real and Estimated Profiles
Abstract
Efficient exploitation of the available cache memory space can significantly improve program performance. By carefully restructuring a program so that temporally local sequences of instructions are mapped to different portions of the cache, fewer cache conflicts result. In this paper we present a link-time procedure mapping algorithm that views the cache as a colored address space, with each color representing a different cache line. The idea of coloring is borrowed from the register allocation problem. We consider both the size of the cache and the size of a cache line when coloring procedures to indicate where each procedure currently maps in the cache space. The algorithm takes as input a weighted program call graph, with nodes representing procedures and edge weights representing call frequencies, and uses it to produce an improved program layout. The results in this paper address the reduction of first-generation conflicts (mapping conflicts between a parent and a child). We also describe a methodology for weighting a program call graph without the aid of dynamic profiles, and use it to guide our coloring algorithm. We refer to this strategy as statically generating a call graph. We employ static branch prediction to estimate program behavior, which in turn is used to weight the edges between procedures in the program call graph. Using profile-based weighted call graphs, our algorithm reduces the instruction cache miss rate by 40% on average relative to the original program mapping and by 17% relative to the mapping algorithm of Pettis and Hansen [21]. Using statically formed program call graphs, our coloring algorithm improves performance by 20% on average over the original program mapping.
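To make the coloring idea concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes a direct-mapped cache, and the names `colors` and `place`, the greedy edge ordering, and the example sizes are all hypothetical choices made for illustration. Each procedure's "colors" are the cache line indices it occupies, and a procedure is placed at the lowest line-aligned address whose colors do not overlap those of its already-placed parents and children.

```python
def colors(start, size, cache_size, line_size):
    """Cache line indices ("colors") occupied by a procedure of `size` bytes
    at byte address `start`, assuming a direct-mapped cache."""
    num_lines = cache_size // line_size
    first = start // line_size
    last = (start + size - 1) // line_size
    if last - first + 1 >= num_lines:
        return set(range(num_lines))  # procedure covers the whole cache
    return {line % num_lines for line in range(first, last + 1)}


def place(sizes, edges, cache_size, line_size):
    """Greedy illustration (an assumption, not the published algorithm):
    visit call graph edges by decreasing weight and place each unplaced
    procedure where it avoids the colors of its placed neighbors."""
    num_lines = cache_size // line_size
    neighbors = {proc: set() for proc in sizes}
    for caller, callee in edges:
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)

    layout = {}
    next_free = 0
    ordered = sorted(edges.items(), key=lambda item: -item[1])
    for (caller, callee), _weight in ordered:
        for proc in (caller, callee):
            if proc in layout:
                continue
            # Colors already taken by placed parents and children.
            used = set()
            for other in neighbors[proc]:
                if other in layout:
                    used |= colors(layout[other], sizes[other],
                                   cache_size, line_size)
            addr = next_free
            for _ in range(num_lines):
                if not (colors(addr, sizes[proc], cache_size, line_size) & used):
                    break
                addr += line_size  # leave a gap and try the next line
            else:
                addr = next_free   # no conflict-free slot; accept conflicts
            layout[proc] = addr
            padded = -(-sizes[proc] // line_size) * line_size
            next_free = addr + padded
    return layout


if __name__ == "__main__":
    # Hypothetical 128-byte cache with 32-byte lines (4 colors).
    sizes = {"main": 64, "f": 64, "g": 64}
    edges = {("main", "f"): 100, ("main", "g"): 10}
    print(place(sizes, edges, cache_size=128, line_size=32))
```

In this toy setup `g` cannot directly follow `f`, because that address would map onto the same cache lines as its parent `main`, so the sketch skips ahead and leaves a gap. A practical layout tool would fill such gaps with rarely executed procedures; procedures that appear in no edge are simply left unplaced here.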
Similar resources
Code Reordering for Multi-level Cache Hierarchies
As the gap between memory and processor performance continues to grow, it becomes increasingly important to exploit cache memory effectively. Both hardware and software techniques can be used to better utilize the cache. Many software solutions produce new program layouts to better utilize the available memory and cache address space. In this paper we present a new link-time code reordering alg...
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
In this paper, we examine temporal-based program interaction in order to improve layout by reducing the probability that program units will conflict in an instruction cache. In that context, we present two profile-guided procedure reordering algorithms. Both techniques use cache line coloring to arrive at a final program layout and target the elimination of first generation cache conflicts (i....
Efficient Procedure Mapping Using Cache Line Coloring
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use va...
Mapping using Cache Line Coloring
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, block size and the resulting cache access time. Software writers use ...
Mapping Using Static Call Graph Estimation
As the gap between memory and processor performance continues to grow, it becomes increasingly important to exploit cache memory effectively. One technique used by compilers and linkers to improve the performance of the cache is code reordering. Code reordering optimizations rearrange a program so that sections of the program with temporal locality will be placed next to each other in the final pro...